A Quantitative Study of German Compound Nouns and their Polish Equivalents
Abstract
We present preliminary results of a quantitative, contrastive study of the structural and semantic relations between German compound nouns (GCNs) and their Polish equivalents. The main premise has been to use only automatic tools, both for collecting the data and for analysing the collected parallel pairs. We show that interesting linguistic insights can be obtained despite the use of generally error-prone methods, as long as the analysed data set is large enough and remains large even after various filtering techniques have been applied.
The topic of GCNs and their Polish equivalents has been investigated before, the most extensive study so far being Jeziorski (1983). Jeziorski relies on a set of roughly 3000 GCNs and corresponding Polish phrases that were manually extracted from handbooks on German word formation and from bilingual German-Polish dictionaries. That study mainly contrasts aspects of word formation and syntax; semantic aspects are not investigated.
Our study is based on a large parallel corpus, the German-Polish part of the third release of the JRC-Acquis parallel corpus (Steinberger et al. 2006), which is a subset of the body of law of the European Union ranging from 1956 to 2006. We collected 2,163,620 GCN tokens, corresponding to 144,207 GCN types. For about 50,000 compound noun types, we were able to identify their Polish equivalents with a precision of approximately 93%, using statistical alignment models and additional linguistic knowledge for filtering.
We describe our methods of automatic data analysis for both languages, including splitting and semantic interpretation of the GCNs, syntactic parsing of the Polish equivalents, bracketing of GCNs using structural evidence from their Polish counterparts, and mutual semantic disambiguation using a parallel German-Polish thesaurus. All methods of analysis are evaluated against a manually annotated test set.
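The abstract does not say which compound splitter was used. As an illustration only, the sketch below implements one common frequency-based heuristic (the geometric-mean criterion of Koehn and Knight 2003): enumerate all segmentations of a word into known parts, allowing a linking element such as "s" or "es" after non-final parts, and keep the segmentation whose parts have the highest geometric mean frequency. The frequency table, the linker inventory, the minimum segment length and the function names `segmentations` and `split_compound` are hypothetical assumptions, not the authors' implementation.

```python
from math import prod

# Hypothetical corpus frequencies (assumption); a real system would count
# these on the German side of the parallel corpus.
FREQ = {
    "arbeit": 600, "markt": 400, "politik": 700, "arbeitsmarkt": 100,
    "bund": 500, "regierung": 800, "bundesregierung": 50,
}
# Assumed, deliberately incomplete inventory of German linking elements.
LINKERS = ("", "s", "es", "n", "en")
MIN_LEN = 3  # assumed minimum length of a segment


def segmentations(word):
    """Yield every cut of `word` into known parts, allowing a linker after non-final parts."""
    if word in FREQ:
        yield [word]  # the unsplit word is itself a candidate
    for i in range(MIN_LEN, len(word) - MIN_LEN + 1):
        head, rest = word[:i], word[i:]
        for linker in LINKERS:
            if not head.endswith(linker):
                continue
            stem = head[: len(head) - len(linker)]  # strip the linking element
            if len(stem) >= MIN_LEN and stem in FREQ:
                for tail in segmentations(rest):
                    yield [stem] + tail


def split_compound(word):
    """Return the segmentation with the highest geometric mean of part frequencies."""
    word = word.lower()
    candidates = list(segmentations(word))
    if not candidates:
        return [word]  # unknown word: leave it unsplit
    return max(candidates, key=lambda parts: prod(FREQ[p] for p in parts) ** (1 / len(parts)))


print(split_compound("Arbeitsmarktpolitik"))  # ['arbeit', 'markt', 'politik']
print(split_compound("Bundesregierung"))      # ['bund', 'regierung']
```

The geometric mean lets a split into frequent parts outrank a rarer unsplit form; a realistic splitter would add part-of-speech constraints and a much larger linker inventory.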
An overview of the structural patterns identified for both languages is given. We contrast the part-of-speech structures of the GCN segments, their bracketing structures and the automatically identified semantic relations between the segments with the syntactic and semantic structure of their Polish equivalents. All results are ordered by statistical significance. The impact of errors introduced by the application of unsupervised methods is discussed, and examples are given of how these errors can be minimised using evidence from parallel data. Perhaps the greatest advantage of using fully automatic methods, once they have been developed and tested, is the ease of reapplying them to other …
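The abstract mentions bracketing GCNs with structural evidence from their Polish counterparts but does not give the procedure. A minimal hypothetical sketch for three-segment compounds A B C follows, assuming a word alignment from each German segment to a Polish token and a dependency parse of the Polish phrase: if the token aligned to A depends on the token aligned to B, the compound is bracketed [[A B] C]; if it depends on the token aligned to C, it is bracketed [A [B C]]. The rule and the function name `bracket_triple` are illustrative assumptions, not the authors' method. In the example, Polish "polityka rynku pracy" (Arbeitsmarktpolitik) has "pracy" depending on "rynku", which depends on "polityka".

```python
from typing import Optional, Sequence


def bracket_triple(
    segments: Sequence[str], alignment: Sequence[int], heads: Sequence[int]
) -> Optional[str]:
    """Hypothetical rule: bracket a compound A B C from a parse of its Polish equivalent.

    segments  -- German segments (A, B, C), e.g. produced by a compound splitter
    alignment -- index of the Polish token aligned to each segment (assumed given)
    heads     -- dependency head index of each Polish token (-1 for the root)
    """
    a, b, c = segments
    tok_a, tok_b, tok_c = alignment
    if heads[tok_a] == tok_b:
        return f"[[{a} {b}] {c}]"  # A modifies B: left-branching
    if heads[tok_a] == tok_c:
        return f"[{a} [{b} {c}]]"  # A modifies the head C directly: right-branching
    return None  # no structural evidence either way


# Polish tokens: 0 polityka (root), 1 rynku -> 0, 2 pracy -> 1
# Alignment: arbeit -> pracy (2), markt -> rynku (1), politik -> polityka (0)
print(bracket_triple(("arbeit", "markt", "politik"), (2, 1, 0), [-1, 0, 1]))  # [[arbeit markt] politik]
```

Returning `None` when the parse offers no evidence keeps undecided cases separate, which fits the abstract's strategy of filtering noisy automatic output rather than forcing a decision.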
Publication date: 2008